An open diachronic corpus of historical Spanish
Similar resources
An open diachronic corpus of historical Spanish
The impact-es diachronic corpus of historical Spanish compiles over one hundred books —containing approximately 8 million words— in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) in...
An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation of spelling
The impact-es diachronic corpus of historical Spanish compiles over one hundred books —containing approximately 8 million words— in addition to a complementary lexicon which links more than 10 thousand lemmas with attestations of the different variants found in the documents. This textual corpus and the accompanying lexicon have been released under an open license (Creative Commons by-nc-sa) in...
Linguistically-Enhanced Search over an Open Diachronic Corpus
The BVC section of the impact-es diachronic corpus of historical Spanish compiles 86 books containing approximately 2 million words. About 27% of the words, providing a representative coverage of the most frequent word forms, have been annotated with their lemma, part of speech, and modern equivalent following the Text Encoding Initiative guidelines. We describe how this type of annotation can...
Annotation and Representation of a Diachronic Corpus of Spanish
In this article we describe two different strategies for the automatic tagging of a Spanish diachronic corpus involving the adaptation of existing NLP tools developed for modern Spanish. In the initial approach we follow a state-of-the-art strategy, which consists of standardizing the spelling and the lexicon. This approach boosts POS-tagging accuracy to 90, which represents a raw improvement o...
Machine Translation between Language Stages: Extracting Historical Grammar from a Parallel Diachronic Corpus of Polish
This paper explores methods for the extrapolation of correspondences in a small parallel diachronic corpus taken from the Modern and Middle Polish Bible, in an attempt to answer the question “can historical grammar and lexica be derived directly from a corpus?” The problem of extracting this data is approached from a machine translation point of view: by envisioning texts from different periods...
Journal
Journal title: Language Resources and Evaluation
Year: 2013
ISSN: 1574-020X,1574-0218
DOI: 10.1007/s10579-013-9239-y